library(rgeos)
library(rgdal)
library(sp)
library(maptools)
library(spatstat)
library(spdep)
library(INLA)
library(inlabru)
library(readxl)
library(lubridate)
library(ggmap)
library(raster)
Next, lets check our current working directory!
getwd()
## [1] "/Users/joe/ownCloud/DFO_Workshop_2020/DFO_SDM_Workshop_2020"
We should be inside the folder titled: ‘DFO_SDM_Workshop_2020’. If not, change this by navigating to the correct folder in the bottom right panel (folder view), opening it, and then clicking “Set as Working Directory” under the tab ‘More’.
Finally, let’s turn off all warnings associated with coordinate reference systems.
rgdal::set_rgdal_show_exportToProj4_warnings(FALSE)
rgdal::set_thin_PROJ6_warnings(TRUE)
options("rgdal_show_exportToProj4_warnings"="none")
Now we can load in the precompiled data and functions
load('./Data/Compiled_Data_2.RData')
source('utility_functions.R')
We will be fitting log-Gaussian Cox process models to the sightings data. This assumes that, conditional upon the density surface \(\lambda_{obs}(\textbf{s})\), the sighting locations are independent.
Planned surveys can be designed to (approximately) satisfy the independence assumption. The boat speed can be fixed at a high value relative to the individuals’ speeds and a narrow perpendicular field-of-view can be enforced. Then, assuming that the individuals of the target species are in equilibrium with respect to their stationary distribution \(\lambda_{true}(\textbf{s})\) (i.e. the species’ density) and locally mixing faster than the time between revisits by observers, the independence assumption may reasonably hold.
Conversely, no such assumptions can be made regarding the whale watch sightings. The unknown positions of the whale watch boats, the 360 degree fields-of-view on-board, the variable boat speed, and the repeated sightings of the same individuals throughout each day can all invalidate the independence assumption required to fit a log-Gaussian Cox process.
In these workshops, we will see some potential solutions to address this issue of independence. In this session, we will perform some exploratory analyses to investigate the suitability of the independence assumption, and to investigate possible ways of estimating observer effort.
Lets look at the whale watch data. Specifically, let’s investigate the 2nd, 3rd and 4th sightings in the dataset. Notice that they were made on the same day and very close together in time. Often, little information is provided on the database management procedures. For example, are these multiple encounters with the same individual? Would the database discard or retain repeated sightings?
Sightings_DRWW_sp@data[2:4,c('WS_TIME','WS_DATE','PLATFORM')]
## # A tibble: 3 x 3
## WS_TIME WS_DATE PLATFORM
## <Period> <dttm> <chr>
## 1 16H 5M 0S 2007-06-19 00:00:00 BRIER ISLAND WHALEWATCH
## 2 16H 23M 0S 2007-06-19 00:00:00 BRIER ISLAND WHALEWATCH
## 3 16H 32M 0S 2007-06-19 00:00:00 BRIER ISLAND WHALEWATCH
Without additional information available on the database management protocols, we can investigate whether these sightings could be of the same individual. To investigate this hypothesis, we can compute the distance between the sightings, the time between the sightings, and thus the required travel speed of the individual under the assumed hypothesis.
Sightings_DRWW_sp@data[2:4,c('WS_TIME','WS_DATE','PLATFORM')]
## # A tibble: 3 x 3
## WS_TIME WS_DATE PLATFORM
## <Period> <dttm> <chr>
## 1 16H 5M 0S 2007-06-19 00:00:00 BRIER ISLAND WHALEWATCH
## 2 16H 23M 0S 2007-06-19 00:00:00 BRIER ISLAND WHALEWATCH
## 3 16H 32M 0S 2007-06-19 00:00:00 BRIER ISLAND WHALEWATCH
# How far away are these sightings in space (in kilometers)?
gDistance(Sightings_DRWW_sp[2:4,], byid = T)
## 1 2 3
## 1 0.000000 1.316181 3.637082
## 2 1.316181 0.000000 2.386592
## 3 3.637082 2.386592 0.000000
# How far away are these sightings in time (in minutes)?
Sightings_DRWW_sp[3:4,]$WS_TIME-Sightings_DRWW_sp[2:3,]$WS_TIME
## [1] "18M 0S" "9M 0S"
# Could this be the same animal? How fast is this implied movement?
gDistance(Sightings_DRWW_sp[2:4,], byid = T)[cbind(c(1,2),c(2:3))] /
(c(18, 9)/60)
## [1] 4.38727 15.91061
This hypothesis would imply a traveling speed of between 4km/h and 16km/h. Is this plausible?
To be safe moving forward, we will discard all repeated sightings each day, keeping only the first sighting made each day by each company. Note that there is no overlap between the sighting locations from the two whale watch companies, indicating that an assumption of independent encounters between the two companies could be reasonable.
Sightings_DRWW_sp$PLATFORM <- as.factor(Sightings_DRWW_sp$PLATFORM)
ggplot() + gg(Domain) + gg(Sightings_DRWW_sp, colour=Sightings_DRWW_sp$PLATFORM_CODE)
Thus we subset the data to keep only the initial sightings:
Sightings_Brier_nodup <- Sightings_DRWW_sp[Sightings_DRWW_sp$PLATFORM=='BRIER ISLAND WHALEWATCH',]
Sightings_Brier_nodup <- Sightings_Brier_nodup[!duplicated(Sightings_Brier_nodup$WS_DATE),]
Sightings_Quoddy_nodup <- Sightings_DRWW_sp[Sightings_DRWW_sp$PLATFORM=='QUODDY LINK',]
Sightings_Quoddy_nodup <- Sightings_Quoddy_nodup[!duplicated(Sightings_Quoddy_nodup$WS_DATE),]
dim(Sightings_DRWW_sp)[1]; dim(Sightings_Brier_nodup)[1]; dim(Sightings_Quoddy_nodup)[1];
## [1] 567
## [1] 85
## [1] 144
Notice that we have a total of 229 whale watch sightings after the subsetting procedure, down from an original count of 567 sightings.
Strictly speaking, for a log-Gaussian Cox process model to be reasonable, it is required that the initial daily encounter locations from the whale watch vessels are independent realizations from \(\lambda_{true}\). We will assume that the typical journeys/routes of the whale watch vessels remains constant across months and years under study. Thus, we assume that \(\lambda_{eff}\) is constant through time.
Finally, as we discussed earlier, we are unable to model temporal changes in the species density due to the spatially non-overlapping survey efforts across the months and years. Thus any changes in the species density from June - August, or across the years 2007-2009 will be missed. Thus, we are forced to assume that the summer species density remains constant between 2007-2009. It is the summer species density that is our target for inference.
How reasonable is our assumption that the species density remains constant across the years and across the months? Again, we can visually assess the suitability of the assumption with a plot.
If each sighting is indeed an independent realisation from a temporally static point process, then there should be no association between the spatial distances between the whale watch sightings and how far apart in time they were made.
So, we compute the distances in time and the euclidean distances in space between each sighting and plot time and space on the x and y axes respectively.
difftime_Brier<- dist(as.numeric(ymd(Sightings_Brier_nodup$WS_DATE)))
diffspace_Brier <- dist(Sightings_Brier_nodup@coords)
difftime_Brier_fac <- as.factor(difftime_Brier)
difftime_Quoddy<- dist(as.numeric(ymd(Sightings_Quoddy_nodup$WS_DATE)))
diffspace_Quoddy <- dist(Sightings_Quoddy_nodup@coords)
difftime_Quoddy_fac <- as.factor(difftime_Quoddy)
ggplot(data=data.frame(difftime = as.numeric(difftime_Brier)[as.numeric(difftime_Brier)<85],
diffspace = as.numeric(diffspace_Brier)[as.numeric(difftime_Brier)<85]),
aes(x=difftime, y=diffspace)) +
geom_point() + geom_smooth(method='loess') +
xlab('difference in time (days)') +
ylab('difference in space (km)')
ggplot(data=data.frame(difftime = as.numeric(difftime_Brier)[],
diffspace = as.numeric(diffspace_Brier)[]),
aes(x=difftime, y=diffspace)) +
geom_point() + geom_smooth(method='loess')+
xlab('difference in time (days)') +
ylab('difference in space (km)')
ggplot(data=data.frame(difftime =
difftime_Brier_fac[as.numeric(difftime_Brier)<85],
diffspace = as.numeric(diffspace_Brier)[as.numeric(difftime_Brier)<85]),
aes(y=diffspace, group=difftime)) +
geom_boxplot() +
xlab('difference in time (days)') +
ylab('difference in space (km)')
These plots both have days on the x-axis and km on the y-axis, with the first plot restricted to only consider temporal differences of less than 85 days.
The first plot indicates some moderate autocorrelation between the sightings, with encounters made ~1 month apart typically being around 50% further away than those made 24 hours apart. Notice that the trend plateaus after ~14 days. Perhaps this trend is due to the autocorrelation being caused exclusively by persistence in animal movement? Furthermore, the flat-line trend between day ~30 and day ~75 could be evidence in favour of an approximately constant density across the summer months. If the space use did indeed change dramatically across the months (e.g. due to migration), then we would expect to see a monotonically increasing trend over time.
The second plot shows no evidence that the average spatial separation between whale watch sightings differs year-to-year. This supports our assumption that the spatial density can be assumed to be constant across the years, albeit within the (very) small spatial region visited by the whale watch boats departing from Brier Island.
We now repeat the plots, but for the whale watch boats that depart from Quoddy.
ggplot(data=data.frame(difftime = as.numeric(difftime_Quoddy)[as.numeric(difftime_Quoddy)<85],
diffspace = as.numeric(diffspace_Quoddy)[as.numeric(difftime_Quoddy)<85]),
aes(x=difftime, y=diffspace)) +
geom_point() + geom_smooth(method='loess')
ggplot() + gg(Domain) + gg(Sightings_DRWW_sp, colour=Sightings_DRWW_sp$PLATFORM_CODE)
ggplot(data=data.frame(difftime = as.numeric(difftime_Quoddy)[as.numeric(difftime_Quoddy)<85 & as.numeric(diffspace_Quoddy)<50],
diffspace = as.numeric(diffspace_Quoddy)[as.numeric(difftime_Quoddy)<85 & as.numeric(diffspace_Quoddy)<50]),
aes(x=difftime, y=diffspace)) +
geom_point() + geom_smooth(method='loess')
ggplot(data=data.frame(difftime =
difftime_Quoddy_fac[as.numeric(difftime_Quoddy)<85 & as.numeric(diffspace_Quoddy)<50],
diffspace = as.numeric(diffspace_Quoddy)[as.numeric(difftime_Quoddy)<85 & as.numeric(diffspace_Quoddy)<50]),
aes(y=diffspace, group=difftime)) +
geom_boxplot() +
xlab('difference in time (days)') +
ylab('difference in space (km)')
In the first plot we see a large separation in the y-axis. This is caused by a cluster of sightings made way East of port. This is seen in the second plot as the right-most cluster of blue points. To stop this from impacting our estimate of trend, we restrict the spatial differences and times to be those within each of the two clusters of sightings. The third plot displays the results from doing this. Once again, we see some potential evidence of autocorrelation upto ~14 days. This could be due to persistence of animal movement.
We have seen some evidence that residual autocorrelations remain in the sightings data. This is despite subsetting the data to the initial daily sightings. The result of this may be overdispersion. We may be able to partially correct for this by using of a log-Gaussian Cox process. By including latent effects (e.g. spatial fields) in the model, we can hopefully capture some of this additional clustering. in fact, log-Gaussian Cox processes are always overdispersed relative to Poisson processes.
So, we continue on with our exploratory analysis. We now subset the data into a training and test set. We choose the training set to contain all the sightings from 2007-2009. For the test data, we choose all the sightings from 2011.
Could we not also put 2011’s whale watch data into the training set? Below is some (optional) code showing that we cannot. A whale is sighted by at least one whale watch company on every day in 2011 that a survey detects a whale. The plot below shows that these sightings could have been of the same animal. Even if they weren’t, the tracklines in 2011 still visited areas that the whale watch vessels were present. Since our training and test datasets are required to be independent of each other, we must therefore also remove the 2011 whale watch sightings from the training data.
unique(Sightings_survey$DATE_LO[Sightings_survey$YEAR==2011])
## [1] "2011-06-21" "2011-08-12" "2011-08-13"
sum(grepl(Sightings_survey$DATE_LO[Sightings_survey$YEAR==2011][1],
x=c(Sightings_Quoddy_nodup$WS_DATE[Sightings_Quoddy_nodup$YEAR==2011],
Sightings_Brier_nodup$WS_DATE[Sightings_Brier_nodup$YEAR==2011])))>0
## [1] TRUE
sum(grepl(Sightings_survey$DATE_LO[Sightings_survey$YEAR==2011][2],
x=c(Sightings_Quoddy_nodup$WS_DATE[Sightings_Quoddy_nodup$YEAR==2011],
Sightings_Brier_nodup$WS_DATE[Sightings_Brier_nodup$YEAR==2011])))>0
## [1] TRUE
sum(grepl(Sightings_survey$DATE_LO[Sightings_survey$YEAR==2011][3],
x=c(Sightings_Quoddy_nodup$WS_DATE[Sightings_Quoddy_nodup$YEAR==2011],
Sightings_Brier_nodup$WS_DATE[Sightings_Brier_nodup$YEAR==2011])))>0
## [1] TRUE
ggplot() + gg(Domain) +
gg(Sightings_survey[Sightings_survey$YEAR==2011,],colour='green') +
gg.spatiallines_mod(Effort_survey[Effort_survey$YEAR==2011,],colour='yellow')
Below we split the data into training and test data.
Sightings_Brier_nodup_test <- Sightings_Brier_nodup[Sightings_Brier_nodup$YEAR==2011,]
Sightings_Brier_nodup <- Sightings_Brier_nodup[Sightings_Brier_nodup$YEAR!=2011,]
Sightings_Quoddy_nodup_test <- Sightings_Quoddy_nodup[Sightings_Quoddy_nodup$YEAR==2011,]
Sightings_Quoddy_nodup <- Sightings_Quoddy_nodup[Sightings_Quoddy_nodup$YEAR!=2011,]
Sightings_survey_test <- Sightings_survey[Sightings_survey$YEAR==2011,]
Sightings_survey <- Sightings_survey[Sightings_survey$YEAR!=2011,]
Effort_survey_test <- Effort_survey[Effort_survey$YEAR=='2011',]
Effort_survey <- Effort_survey[Effort_survey$YEAR!='2011',]
# How many survey sightings by year?
table(Sightings_survey$YEAR)
##
## 2007 2008 2009
## 45 17 1
Next, we evaluate a crude estimate of the total amount of survey effort spent each year, as measured by the total trackline length. Using this, we then crudely estimate a catch per unit effort measure by year.
# How much survey effort (in trackline length) is there by year
by(gLength(Effort_survey,byid = T), Effort_survey$YEAR, sum)
## Effort_survey$YEAR: 2007
## [1] 11643.15
## ------------------------------------------------------------
## Effort_survey$YEAR: 2008
## [1] 571.9083
## ------------------------------------------------------------
## Effort_survey$YEAR: 2009
## [1] 819.9337
# Crude estimate of relative `CPUE'
(table(Sightings_survey$YEAR) /
by(gLength(Effort_survey,byid = T), Effort_survey$YEAR, sum))/
min(table(Sightings_survey$YEAR) /
by(gLength(Effort_survey,byid = T), Effort_survey$YEAR, sum))
##
## 2007 2008 2009
## 3.168989 24.372565 1.000000
# How many WW sightings by year
table(Sightings_Quoddy_nodup$YEAR)
##
## 2007 2008 2009
## 40 24 44
table(Sightings_Brier_nodup$YEAR)
##
## 2007 2008 2009
## 24 19 19
The ‘CPUE’ values are highly variable! The wild differences could be due to the small efforts in years 2008 and 2009 relative to 2007. Alternatively, the differences could actually reflect real differences in whale density across the three distinct regions visited each year! Without taking a model-based approach, it is unclear how to combine the sightings from these three surveys.
Conversely, by taking a model-based approach to inference, we are able to combine the data from all three surveys to estimate the whale density across space.
ggplot() +
gg(Domain) +
gg.spatiallines_mod(Effort_survey[Effort_survey$YEAR=='2007',], colour='red') +
gg.spatiallines_mod(Effort_survey[Effort_survey$YEAR=='2008',], colour='black') +
gg.spatiallines_mod(Effort_survey[Effort_survey$YEAR=='2009',], colour='blue') +
gg.spatiallines_mod(Effort_survey_test, colour='purple')
Note that the 2011 data (in purple) overlap with all the previous years and WW data. 2011 therefore provides the perfect test dataset for comparing models. The 3 years of surveys were conducted under 3 different surveys following similar (but different) protocols. Unfortunately, due to the lack of spatial overlap between the surveys’ efforts (tracklines), we are unable to account for any differences between the survey protocols within a model without strong prior information available. For example, an intercept added to the model for each survey would be confounded by the spatial field.
xtabs(~DATASET+YEAR,Effort_survey@data)
## YEAR
## DATASET 2007 2008 2009
## DFO 49 0 0
## NOAA_1 3 20 19
## NOAA_2 49 0 0
We now investigate trends between the whale density and the covariates bathymetry and Slope. A simple first step is to evaluate the empirical densities of each covariate at both the encounter locations and across the domain. One easy and quick way to do this is to convert the SpatialPixelsDataFrame objects into objects of class im from the spatstat package. Then, after creating a ppp point pattern object containing the sightings locations from all sources, we can easily extract the values of each covariate at each point location. Alternatively, we can use the function over() from the sp package. We choose to create im objects so that we can later use the ppm function from spatstat for exploratory analysis purposes. ppm fits Poisson process models, equivalent to Maxent models.
# Convert covariates to im format
Slope_im <- as(as(Slope, 'SpatialGridDataFrame'),'im')
Bathym_im <- as(as(Bathym, 'SpatialGridDataFrame'),'im')
# Convert Observation locations to ppp object
All_obs_ppp <- ppp(x=c(Sightings_survey@coords[,1],
Sightings_Brier_nodup@coords[,1],
Sightings_Quoddy_nodup@coords[,1]),
y=c(Sightings_survey@coords[,2],
Sightings_Brier_nodup@coords[,2],
Sightings_Quoddy_nodup@coords[,2]),
marks = as.factor(c(rep('survey',dim(Sightings_survey@coords)[1]),
rep('Brier',dim(Sightings_Brier_nodup@coords)[1]),
rep('Quoddy',dim(Sightings_Quoddy_nodup@coords)[1]))),
window = as(Domain, 'owin'))
# Notice how easy it is to extract the values of slope at these locations!
Slope_obs <- Slope_im[All_obs_ppp]
ggplot(data=data.frame(Slope_obs = Slope_obs),
aes(x=Slope_obs)) +
geom_density() +
geom_density(data=data.frame(Slope_dom = as.numeric(Slope_im$v)),
aes(x=Slope_dom), colour='red') + xlab('Slope')
# Perhaps a log transform is desirable here to make the covariate more 'normal'
ggplot(data=data.frame(Slope_obs = log(Slope_obs)),
aes(x=Slope_obs)) +
geom_density() +
geom_density(data=data.frame(Slope_dom = log(as.numeric(Slope_im$v))),
aes(x=Slope_dom), colour='red') + xlab('log Slope')
# Perhaps there is evidence of whales preferring areas of higher slope?
Bathym_obs <- Bathym_im[All_obs_ppp]
ggplot(data=data.frame(Bathym_obs = Bathym_obs),
aes(x=Bathym_obs)) +
geom_density() +
geom_density(data=data.frame(Bathym_dom = as.numeric(Bathym_im$v[Bathym_im$v<0])),
aes(x=Bathym_dom), colour='red') + xlab('Bathymetry (m)')
# Perhaps a log transform is desired here. Max bathym is 313
# Lets only consider values in water
# Flip the sign to make it depth (larger is deeper)
ggplot(data=data.frame(Bathym_obs = log(1-Bathym_obs)),
aes(x=Bathym_obs)) +
geom_density() +
geom_density(data=data.frame(Bathym_dom = log(as.numeric(1-Bathym_im$v[Bathym_im$v<0]))),
aes(x=Bathym_dom), colour='red') +
geom_vline(xintercept = 0, colour='blue') + xlab('log Depth')
# Perhaps there is evidence of whales preferring areas of shallower waters?
For each covariate, we plot the empirical density at the sightings locations in black and at the background in red. From this, we see that sightings locations are associated with regions with shallower waters and flatter slopes compared with the domain as a whole. However, these results may simply reflect the types of regions preferentially visited by observers!
Next, we could go one step further and fit Poisson point process models to the sightings data. These are equivalent to Maxent models. Ignoring effort for the time being, what does a Maxent model tell us?
log_Slope_im <- Slope_im
log_Slope_im$v <- log(log_Slope_im$v)
log_Bathym_im <- Bathym_im
log_Bathym_im[log_Bathym_im >= 0] <- -1
log_Bathym_im$v <- log(1-log_Bathym_im$v)
Maxent_mod <- ppm(All_obs_ppp~log_Slope_im++log_Bathym_im,
data=list(log_Slope_im=log_Slope_im, log_Bathym_im=log_Bathym_im,
Domain=as(Domain, 'owin')),
subset = Domain)
summary(Maxent_mod)
## Point process model
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.formula(Q = All_obs_ppp ~ log_Slope_im + +log_Bathym_im,
## data = list(log_Slope_im = log_Slope_im, log_Bathym_im = log_Bathym_im,
## Domain = as(Domain, "owin")), subset = Domain)
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Marked planar point pattern: 232 points
## Average intensity 0.000633 points per square unit
## Multitype:
## frequency proportion intensity
## Brier 62 0.267 0.000169
## Quoddy 107 0.461 0.000292
## survey 63 0.272 0.000172
##
## Window: polygonal boundary
## 2 separate polygons (1 hole)
## vertices area relative.area
## polygon 1 1519 366946.00 1.000000
## polygon 2 (hole) 38 -179.42 -0.000489
## enclosing rectangle: [112.6639, 1141.4561] x [4451.802, 5259.268] units
## (1029 x 807.5 units)
## Window area = 366766 square units
## Fraction of frame area: 0.442
##
## Dummy quadrature points:
## 64 x 64 grid of dummy points, plus 4 corner points
## dummy spacing: 16.07488 x 12.61666 units
##
## Original dummy parameters: =
## Marked planar point pattern: 6191 points
## Average intensity 0.0169 points per square unit
## Multitype:
## frequency proportion intensity
## Brier 2080 0.336 0.00567
## Quoddy 2030 0.329 0.00555
## survey 2080 0.336 0.00567
##
## Window: polygonal boundary
## 2 separate polygons (1 hole)
## vertices area relative.area
## polygon 1 1519 366946.00 1.000000
## polygon 2 (hole) 38 -179.42 -0.000489
## enclosing rectangle: [112.6639, 1141.4561] x [4451.802, 5259.268] units
## (1029 x 807.5 units)
## Window area = 366766 square units
## Fraction of frame area: 0.442
## Quadrature weights:
## (counting weights based on 64 x 64 array of rectangular tiles)
## All weights:
## range: [2.6, 203] total: 1100000
## Weights on data points:
## range: [2.6, 101] total: 6790
## Weights on dummy points:
## range: [2.6, 203] total: 1090000
## --------------------------------------------------------------------------------
## FITTED MODEL:
##
## Nonstationary multitype Poisson process
## Possible marks:
## Brier Quoddy survey
## ---- Intensity: ----
##
## Log intensity: ~log_Slope_im + +log_Bathym_im
## Model depends on external covariates 'log_Slope_im' and 'log_Bathym_im'
## Covariates provided:
## log_Slope_im: im
## log_Bathym_im: im
## Domain: owin
##
## Fitted trend coefficients:
## (Intercept) log_Slope_im log_Bathym_im
## -4.2918939 0.6780343 -0.7629490
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -4.2918939 0.17472036 -4.6343395 -3.9494483 *** -24.56436
## log_Slope_im 0.6780343 0.05974179 0.5609425 0.7951260 *** 11.34941
## log_Bathym_im -0.7629490 0.03610715 -0.8337177 -0.6921802 *** -21.13013
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept) log_Slope_im log_Bathym_im
## -4.2918939 0.6780343 -0.7629490
##
## Fitted exp(theta):
## (Intercept) log_Slope_im log_Bathym_im
## 0.01367899 1.97000142 0.46628933
We find some evidence that the whales prefer regions of higher slope and shallower waters.
Next, we can attempt to make a crude adjustment for effort and see if these estimates change. We have alreads created a SpatialPixelsDataFrame object containing distance from the two whale watch ports. Next, we can create a SpatialPixelsDataFrame containing the closest distance from the effort lines. This fails to account for overlap in survey tracklines.
Dist_Surveylines <- as(Dist_Brier,'SpatialPoints')
Dist_Surveylines$Dist_surveylines <- as.numeric(apply(gDistance(Dist_Surveylines, Effort_survey,byid = T),2,min))
Dist_Surveylines <- as(Dist_Surveylines,'SpatialPixelsDataFrame')
ggplot() + gg(Domain) + gg(Dist_Surveylines)
## Regions defined for each Polygons
# What is the maximum detection distance?
ggplot(Sightings_survey@data, aes(x=DISTANCE)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1 rows containing non-finite values (stat_bin).
# Let's add a half-norm detection probability function, capped at 5km
Dist_Surveylines$Dist_surveylines[Dist_Surveylines$Dist_surveylines > 5000] <- 1e3
# need to square the value
Dist_Surveylines$Dist_surveylines <- scale(Dist_Surveylines$Dist_surveylines^2)
Dist_Brier_sq <- as(Dist_Brier,'SpatialPixels')
Dist_Brier_sq$Dist_Brier_sq <- scale(Dist_Brier$Dist_Brier^2)
Dist_Quoddy_sq <- as(Dist_Quoddy,'SpatialPixels')
Dist_Quoddy_sq$Dist_Quoddy_sq <- scale(Dist_Quoddy$Dist_Quoddy^2)
Maxent_mod <- ppm(All_obs_ppp~log_Slope_im+log_Bathym_im +
I(ifelse(marks=='Brier',1,0)):Dist_Brier_im +
I(ifelse(marks=='Quoddy',1,0)):Dist_Quoddy_im +
I(ifelse(marks=='survey',1,0)):Dist_Survey_im +
marks,
data=list(log_Slope_im=log_Slope_im,
log_Bathym_im=log_Bathym_im,
Dist_Brier_im = as(as(Dist_Brier_sq,'SpatialGridDataFrame'),'im'),
Dist_Quoddy_im = as(as(Dist_Quoddy_sq,'SpatialGridDataFrame'),'im'),
Dist_Survey_im = as(as(Dist_Surveylines,'SpatialGridDataFrame'),'im'),
Domain=as(Domain, 'owin')),
subset = Domain)
coef(summary(Maxent_mod))
## Estimate S.E.
## (Intercept) -820.4577129 109.50202294
## log_Slope_im 0.4054561 0.06224574
## log_Bathym_im 0.1593891 0.07068249
## marksQuoddy 482.3514109 113.06812398
## markssurvey 547.9081580 119.51263528
## I(ifelse(marks == "Brier", 1, 0)):Dist_Brier_im -687.6788858 92.23298990
## I(ifelse(marks == "Quoddy", 1, 0)):Dist_Quoddy_im -245.1096937 20.37752447
## I(ifelse(marks == "survey", 1, 0)):Dist_Survey_im -643.2218903 117.20897712
## CI95.lo CI95.hi
## (Intercept) -1.035078e+03 -605.8376917
## log_Slope_im 2.834567e-01 0.5274555
## log_Bathym_im 2.085397e-02 0.2979242
## marksQuoddy 2.607420e+02 703.9608617
## markssurvey 3.136677e+02 782.1486188
## I(ifelse(marks == "Brier", 1, 0)):Dist_Brier_im -8.684522e+02 -506.9055474
## I(ifelse(marks == "Quoddy", 1, 0)):Dist_Quoddy_im -2.850489e+02 -205.1704797
## I(ifelse(marks == "survey", 1, 0)):Dist_Survey_im -8.729473e+02 -413.4965165
## Ztest Zval
## (Intercept) *** -7.492626
## log_Slope_im *** 6.513797
## log_Bathym_im * 2.255001
## marksQuoddy *** 4.266025
## markssurvey *** 4.584521
## I(ifelse(marks == "Brier", 1, 0)):Dist_Brier_im *** -7.455888
## I(ifelse(marks == "Quoddy", 1, 0)):Dist_Quoddy_im *** -12.028433
## I(ifelse(marks == "survey", 1, 0)):Dist_Survey_im *** -5.487821
With this rough attempt at controlling for heterogeneous effort, we see that both log slope and log depth are found to be significantly associated with the whale density, with positive coefficients for both. The negative effects of all three complicated expressions imply that a negative association between ‘distance’ and observed intensity is witnessed (agreeing with common sense).
We can plot the true whale intensity predicted from this model:
#dev.off()
# remove the effects of observers for plotting
pred_coef <- coef(Maxent_mod)
pred_coef[c(1,4,5,6,7,8)] <- 0
plot(predict.ppm(Maxent_mod, ngrid=c(300,300),
new.coef=pred_coef)[[1]],superimpose=F,
hcl.colors(60, "YlOrRd", rev = TRUE))
Alternatively, we can also plot the fitted observed intensity surface \(\lambda_{obs}\) (=\(\lambda_{true} \times \lambda_{effort}\)) for all observer types. We plot the observed intensity surfaces for both the survey and Quoddy-based whale watch vessels.
pred_coef <- coef(Maxent_mod)
plot(log(predict(Maxent_mod, ngrid=c(300,300),
new.coef=pred_coef)[[3]]),superimpose=F,
hcl.colors(60, "YlOrRd", rev = TRUE))
# Plot Quoddy
plot(log(predict(Maxent_mod, ngrid=c(300,300),
new.coef=pred_coef)[[2]]),superimpose=F,
hcl.colors(60, "YlOrRd", rev = TRUE))
This exploratory analysis is useful at informing us about possible species-environment relationships. By playing around with different covariates and functional forms (e.g. quadratic effects), we can develop an understanding of which models to formally compare in the modelling stage.
Unfortunately, by being in the inhomogeneous Poisson process class of point processes, the above models lack one crucial feature. They are unable to account for any additional spatial correlations/clustering that are frequently caused by unmeasured environmental covariates and biological processes acting behind the scenes.
We can actually investigate the suitability (or unsuitability) of this inhomogeneous Poisson process assumption by plotting the (estimated) inhomogeneous K functions. Loosely speaking, for a homogeneous Poisson process (with constant intensity) the K function evaluated at a distance r defines a (scalar multiple of) the expected number of extra events that occur within a distance \(r\) of a randomly chosen point (event). The K function can be evaluated exactly for homogeneous Poisson processes and takes the form \(K(r) = \pi r^2\). The empirical K function can also be computed from the data and compared with the true one. The empirical K function counts the observed number of inter-point distances (in the observed data) that fall within a sequence of distances \(r_1, r_2 ,..., r_m\). Plotting the observed (empirical) and expected (true) K functions together allows for model assumptions to be assessed.
If the empirical K function falls ‘significantly’ above the expected curve at a distance \(r\), then additional clustering has been detected in the data that cannot be explained by the homogeneous model alone. Conversely, if the empirical K function falls ‘significantly’ below the expected curve at a distance \(r\), then additional repulsion has been detected in the data that cannot be explained by the homogeneous Poisson model alone. This clustering or repulsion may be caused by covariates or biological processes that have been missed!
The K function and empirical K function can also be computed for inhomogeneous Poisson process (or MAXENT) models. The interpretation of the plots is similar, with large discrepancies providing evidence of additional unexplained clustering/repulsion existing in the data that may be caused by unmeasured covariates. Let’s plot these two curves now!
plot(Kinhom(All_obs_ppp,lambda=Maxent_mod,correction = 'best',normpower = 1))
## Warning: The behaviour of Kinhom when lambda is a ppm object has changed (in
## spatstat 1.37-0 and later). See help(Kinhom)
We see a large discrepancy between the true curve (in red) and the empirical curve in black. Thus, the fitted model in its current form is inadequate.
This is why we turn to log-Gaussian Cox processes in the next workshop. By assuming the intensity surface to be a realisation from a spatially-smooth (log-Gaussian) random field, we are able to adjust our inference to take into account any additional spatial correlations and/or unmeasured covariates! By doing this, both parameter estimates and measures of uncertainty should be better adjusted to account for these autocorrelations.